NSF PAR Search | NSF Public Access Repository

Measuring the Stories in Contemporary Songs

https://doi.org/10.63744/w9C0wDxmZTVt

Bamman, David; Baur, Sabrina; Cramer, Mackenzie Hạnh; Ho, Anna; McEnaney, Tom (November 2025, Anthology of Computers and the Humanities)

Full Text Available
Once More, With Feeling: Measuring Emotion of Acting Performances in Contemporary American Film

Zhou, Naitian; Bamman, David (December 2024, Fifth Conference on Computational Humanities Research)

Narrative film is a composition of writing, cinematography, editing, and performance. While much computational work has focused on the writing or visual style in film, we conduct in this paper a computational exploration of acting performance. Applying speech emotion recognition models and a variationist sociolinguistic analytical framework to a corpus of popular, contemporary American film, we find narrative structure, diachronic shifts, and genre- and dialogue-based constraints located in spoken performances.
Full Text Available
Subversive Characters and Stereotyping Readers: Characterizing Queer Relationalities with Dialogue-Based Relation Extraction

Chang, Kent; Ho, Anna; Bamman, David (December 2024, Conference on Computational Humanities Research)

Full Text Available
On Classification with Large Language Models in Cultural Analytics

Bamman, David; Chang, Kent; Lucy, Li; Zhou, Naitian (December 2024, Fifth Conference on Computational Humanities Research)

In this work, we survey the way in which classification is used as a sensemaking practice in cultural analytics, and assess where large language models can fit into this landscape. We identify ten tasks supported by publicly available datasets on which we empirically assess the performance of LLMs compared to traditional supervised methods, and explore the ways in which LLMs can be employed for sensemaking goals beyond mere accuracy. We find that prompt-based LLMs are competitive with traditional supervised models for established tasks, but perform less well on de novo tasks. In addition, LLMs can assist sensemaking by acting as an intermediary input to formal theory testing.
Full Text Available
Social Meme-ing: Measuring Linguistic Variation in Memes

Zhou, Naitian; Jurgens, David; Bamman, David (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Much work in the space of NLP has used computational methods to explore sociolinguistic variation in text. In this paper, we argue that memes, as multimodal forms of language comprised of visual templates and text, also exhibit meaningful social variation. We construct a computational pipeline to cluster individual instances of memes into templates and semantic variables, taking advantage of their multimodal structure in doing so. We apply this method to a large collection of meme images from Reddit and make available the resulting SEMANTICMEMES dataset of 3.8M images clustered by their semantic function. We use these clusters to analyze linguistic variation in memes, discovering not only that socially meaningful variation in meme usage exists between subreddits, but that patterns of meme innovation and acculturation within these communities align with previous findings on written language.
Full Text Available
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

Lucy, Li; Gururangan, Suchin; Soldaini, Luca; Strubell, Emma; Bamman, David; Klein, Lauren; Dodge, Jesse (August 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL))

Large language models’ (LLMs) abilities are drawn from their pretraining data, and model development begins with data curation. However, decisions around what data is retained or removed during this initial stage are underscrutinized. In our work, we ground web text, which is a popular pretraining data source, to its social and geographic contexts. We create a new dataset of 10.3 million self-descriptions of website creators, and extract information about who they are and where they are from: their topical interests, social roles, and geographic affiliations. Then, we conduct the first study investigating how ten “quality” and English language identification (langID) filters affect webpages that vary along these social dimensions. Our experiments illuminate a range of implicit preferences in data curation: we show that some quality classifiers act like topical domain filters, and langID can overlook English content from some regions of the world. Overall, we hope that our work will encourage a new line of research on pretraining data curation practices and its social implications.
Full Text Available
Small Worlds: Measuring the Mobility of Characters in English-Language Fiction

Wilkens, Matthew; Evans, Elizabeth F; Soni, Sandeep; Bamman, David; Piper, Andrew (May 2024, Journal of Computational Literary Studies)

The representation of mobility in literary narratives has important implications for the cultural understanding of human movement and migration. In this paper, we introduce novel methods for measuring the physical mobility of literary characters through narrative space and time. We capture mobility through geographically defined space, as well as through generic locations such as homes, driveways, and forests. Using a dataset of over 13,000 books published in English since 1789, we observe significant "small world" effects in fictional narratives. Specifically, we find that fictional characters cover far less distance than their non-fictional counterparts; the pathways covered by fictional characters are highly formulaic and limited from a global perspective; and fiction exhibits a distinctive semantic investment in domestic and private places. Surprisingly, we do not find that characters' ascribed gender has a statistically significant effect on distance traveled, but it does influence the semantics of domesticity.
Full Text Available
Racial and Ethnic Representation in Literature Taught in US High Schools

https://doi.org/10.22148/001c.131682

Lucy, Li; Griffiths, Camilla; Ying, Claire; Kim-Ebio, JJ; Baur, Sabrina; Levine, Sarah; Eberhardt, Jennifer L; Bamman, David; Demszky, Dorottya (January 2025, Journal of Cultural Analytics)

We quantify the representation, or presence, of characters of color in English Language Arts (ELA) instruction in the United States to better understand possible racial/ethnic emphases and gaps in literary curricula. We contribute two datasets: the first consists of books listed in widely-adopted Advanced Placement (AP) Literature & Composition exams, and the second is a set of books taught by teachers surveyed from schools with substantial Black and Hispanic student populations. In addition to these book lists, we provide an unprecedented collection of hand-annotated sociodemographic labels of not only literary authors, but also their characters. We use computational methods to measure all main characters’ presence through three distinct and nuanced metrics: frequency, narrative perspective, and burstiness. Our annotations and measurements show that the sociodemographic composition of characters in books recommended by AP Literature has not shifted much for over twenty years. As a case study of how ELA curricula may deviate from the curricula prescribed by AP, our teacher-provided sample shows a greater emphasis on books featuring first-person, primary characters of color. We also find that only a few books in either dataset feature both White main characters and main characters of color. Arguably, these books may uphold a view of racial/ethnic segregation as a societal norm.
Full Text Available
Dramatic Conversation Disentanglement

Chang, Kent; Chen, Danica; Bamman, David (July 2023, Findings of the Association for Computational Linguistics: ACL 2023)

We present a new dataset for studying conversation disentanglement in movies and TV series. While previous work has focused on conversation disentanglement in IRC chatroom dialogues, movies and TV shows provide a space for studying complex pragmatic patterns of floor and topic change in face-to-face multi-party interactions. In this work, we draw on theoretical research in sociolinguistics, sociology, and film studies to operationalize a conversational thread (including the notion of a floor change) in dramatic texts, and use that definition to annotate a dataset of 10,033 dialogue turns (comprising 2,209 threads) from 831 movies. We compare the performance of several disentanglement models on this dramatic dataset, and apply the best-performing model to disentangle 808 movies. We see that, contrary to expectation, average thread lengths do not decrease significantly over the past 40 years, and characters portrayed by actors who are women, while underrepresented, initiate more new conversational threads relative to their speaking time.
Full Text Available
Grounding Characters and Places in Narrative Text

Soni, Sandeep; Sihra, Amanpreet; Evans, Elizabeth; Wilkens, Matthew; Bamman, David (July 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics)

Tracking characters and locations throughout a story can help improve the understanding of its plot structure. Prior research has analyzed characters and locations from text independently without grounding characters to their locations in narrative time. Here, we address this gap by proposing a new spatial relationship categorization task. The objective of the task is to assign a spatial relationship category for every character and location co-mention within a window of text, taking into consideration linguistic context, narrative tense, and temporal scope. To this end, we annotate spatial relationships in approximately 2500 book excerpts and train a model using contextual embeddings as features to predict these relationships. When applied to a set of books, this model allows us to test several hypotheses on mobility and domestic space, revealing that protagonists are more mobile than non-central characters and that women as characters tend to occupy more interior space than men. Overall, our work is the first step towards joint modeling and analysis of characters and places in narrative text.
Full Text Available
